INDEX
Explanations
contractions in text
phrases indicating negative actions or consequences
Negative Logits
Gaul        -0.71
Jihad       -0.68
Fuji        -0.62
Jem         -0.61
Juda        -0.61
Quart       -0.60
Skydragon   -0.60
Guards      -0.60
Kus         -0.59
camer       -0.59
Positive Logits
tarian      0.97
t           0.91
ember       0.91
vest        0.91
vable       0.89
ude         0.87
agree       0.86
ï¸ı         0.83
tre         0.82
sure        0.82
Activations Density: 0.159%