Explanations
knowledge and learning origins
Negative Logits
pistola  0.89
bombing  0.84
ransomware  0.80
nomi  0.79
tên  0.78
gunfire  0.76
titanium  0.75
nome  0.73
botched  0.73
ponytail  0.72
Positive Logits
Knowledge  0.95
Learning  0.93
జ్ఞ  0.93
創造  0.91
Stim  0.89
Creativity  0.89
knowledge  0.86
Representation  0.86
Faculty  0.86
ज्ञान  0.85
Activations Density: 0.001%