INDEX
Explanations
repeated mentions of the term "chat" and names associated with individuals
Negative Logits
springfox    -0.66
PHeader      -0.62
خصة          -0.60
Formazione   -0.57
Fox          -0.57
ROIT         -0.56
fq           -0.56
ⓧ            -0.54
يتيمه        -0.54
semangat     -0.53
Positive Logits
])):         0.81
ori          0.78
Jordan       0.75
")));        0.74
Jordan       0.72
crushes      0.72
]]           0.71
rational     0.71
crush        0.71
"]);         0.71
Activations Density 0.056%