INDEX
Explanations
numbers and measurements
phrases related to urgency or critical processes
Negative Logits
ktop          -0.70
geries        -0.62
terness       -0.60
phal          -0.54
oided         -0.54
nsic          -0.53
zbollah       -0.52
alysed        -0.50
apologised    -0.50
intage        -0.50
Positive Logits
âĢ      1.09
âĢ      1.08
ãĢ      1.03
ðŁij     0.97
ÃĤ      0.96
ðŁ      0.93
ðŁ      0.92
ðŁij     0.91
ï¸ı     0.90
âľ      0.90
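Note on the positive-logit tokens above: strings such as âĢ and ðŁij look like encoding damage, but they are consistent with the way GPT-2-style byte-level BPE vocabularies display raw bytes. That this feature's model uses such a tokenizer is an assumption; the page does not say. Under that assumption, a minimal sketch of mapping the displayed strings back to bytes (the helper names `bytes_to_unicode` and `token_to_bytes` are illustrative and defined here, not imported from a library):

```python
def bytes_to_unicode():
    """Byte -> display-character table in the style of GPT-2 byte-level BPE.

    Printable bytes map to themselves; every other byte is shifted to a
    code point at 256 or above so that each byte has a visible character.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token string (assumed GPT-2 style)."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    # Token strings copied from the Positive Logits list.
    for tok in ["âĢ", "ãĢ", "ðŁij", "ÃĤ", "ðŁ", "ï¸ı", "âľ"]:
        raw = token_to_bytes(tok)
        # Most of these are incomplete UTF-8 prefixes, so show hex, not text.
        print(f"{tok!r:12} -> {raw.hex(' ')}")
```

Under that assumption, âĢ is the byte pair E2 80 (the leading bytes of curly quotes and dashes), ðŁ and ðŁij are the prefixes F0 9F and F0 9F 91 of four-byte emoji, and ï¸ı is the complete sequence EF B8 8F, that is U+FE0F (the emoji variation selector). Read this way, the top positive logits point at non-ASCII punctuation and emoji bytes, not at corrupted text.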
Activations Density: 1.886%