INDEX
Explanations
positive and negative affirmations or exclamations
expressions of disagreement or alarm
Negative Logits
geries    -0.78
iage      -0.76
satell    -0.75
onite     -0.71
aido      -0.71
ogene     -0.70
decomp    -0.70
wana      -0.69
wagon     -0.68
ijah      -0.67
Positive Logits
ï¸ı          1.36
ï¸           0.99
âĢ           0.98
Therefore    0.95
Therefore    0.94
âĢ           0.92
âĶĢâĶĢâĶĢâĶĢ 0.91
¯¯           0.91
Asked        0.90
Then         0.88
Activations Density 0.092%