INDEX
Explanations
phrases related to legality or illegality
negations or words indicating the absence of something
Negative Logits
Jihad      -0.67
Nanto      -0.64
Jem        -0.63
Gaul       -0.63
Crus       -0.63
looms      -0.63
clitor     -0.63
Skydragon  -0.62
mathemat   -0.62
Jinn       -0.61
Positive Logits
agree  0.95
ï¸ı    0.92
ï¸     0.90
ever   0.89
sure   0.88
ude    0.87
yet    0.87
emb    0.83
else   0.83
İ      0.83
Activations Density 0.169%
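
The density figure above can be read as the share of token positions on which the feature fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 17 of which are active.
acts = [0.0] * 9983 + [1.5] * 17
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.169% would mean the feature is active on roughly 17 of every 10,000 tokens sampled.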