INDEX
Explanations
the word "Lug" along with various other similar words with different activation strengths
references to specific locations and languages
Negative Logits
Token     Value
20439     -0.72
GAME      -0.69
STATS     -0.68
éĹ        -0.67
concess   -0.66
DC        -0.65
yson      -0.65
kins      -0.64
Jed       -0.64
CEPT      -0.63
Positive Logits
Token     Value
Lug       0.97
rador     0.96
ansk      0.94
rats      0.92
enger     0.90
uing      0.90
roup      0.87
ues       0.86
emort     0.85
er        0.85
Activations Density 0.016%
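
The density figure can be read as the share of token positions on which the feature fires. The sketch below assumes that definition (activation strictly above zero counts as firing); the dashboard's exact threshold and sampling are not stated here. The function name `activation_density` and the sample data are hypothetical and purely illustrative.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active positions out of 12,500 tokens.
sample = [0.0] * 12_498 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.016%
```

A density this low means the feature is sparse: under the assumption above, it fires on roughly 16 of every 100,000 tokens.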