INDEX
Explanations
the word "almost" with activation values of 9 or 10
Negative Logits
oran          -1.21
agate         -1.10
oris          -1.08
erion         -1.04
eria          -1.03
alam          -0.99
è£ıè¦ļéĨĴ     -0.97
osis          -0.97
oland         -0.94
achus         -0.94
POSITIVE LOGITS
stress        1.10
mundane       1.07
certainly     1.04
zero          0.95
identical     0.94
rito          0.91
exclusively   0.90
instinct      0.88
arser         0.88
lex           0.87
Activations Density: 0.498%