Explanations
phrases related to negative events or circumstances
colons followed by lists or explanations
Negative Logits
roit      -0.75
Merit     -0.69
plex      -0.69
eware     -0.67
itarian   -0.63
icity     -0.62
ebin      -0.61
yon       -0.61
ities     -0.60
entle     -0.60
Positive Logits
namely    1.10
Provided  0.95
Firstly   0.82
Whereas   0.78
http      0.78
↵Âł       0.77
"...      0.77
"'        0.76
"â̦       0.76
https     0.75
Activations Density: 0.113%