INDEX
Explanations
instances of the substring "il"
Negative Logits
ĸļ        -0.80
-+-+      -0.69
©¶æ¥µ     -0.69
ĪĴ        -0.65
olicy     -0.65
Ħ¢        -0.64
nect      -0.63
Ruler     -0.63
Citation  -0.60
Winds     -0.57
Positive Logits
aments    0.95
Nadu      0.91
ibr       0.90
uvian     0.88
azar      0.87
iberal    0.86
icious    0.84
ogical    0.84
oti       0.81
uxe       0.80
Activations Density 0.010%
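
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of sampled tokens on which the feature activates at all. This is an assumption about the metric, not something the dashboard states. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    activation. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example (not real data): one active token out of 10,000.
sample = [0.0] * 9999 + [1.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.010% corresponds to roughly one activating token per 10,000 sampled tokens.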