INDEX
Explanations
instances of the word "think" or phrases expressing opinions and thoughts
Negative Logits
Token    Logit
ez       -0.17
enis     -0.15
ardo     -0.15
жен      -0.15
MBER     -0.14
飯       -0.14
ptal     -0.14
/wiki    -0.13
caff     -0.13
itol     -0.13
Positive Logits
Token    Logit
atti     0.18
arris    0.15
able     0.15
باش      0.15
rolls    0.15
ching    0.14
chia     0.14
it       0.14
tank     0.14
enton    0.14
Activations Density: 0.037%