Explanations
invalid or corrupted email addresses
repeated phrases or structures within a text
Negative Logits
Token         Logit
ichick        -0.73
swer          -0.70
halla         -0.67
qs            -0.67
Squadron      -0.64
llular        -0.64
purs          -0.64
urat          -0.63
baseline      -0.61
ilaterally    -0.60
Positive Logits
Token         Logit
Actor         0.81
PORT          0.72
Britain       0.71
BBC           0.70
Sorry         0.68
THIS          0.67
Scotland      0.65
Warning       0.65
エル          0.64
TOR           0.63
Activations Density 0.041%
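
Token strings in logit lists like the ones above are usually shown in the tokenizer's own vocabulary encoding, so non-ASCII tokens can appear as sequences such as `ãĤ¨ãĥ«`. The sketch below assumes the feature comes from a model with a GPT-2 style byte-level BPE vocabulary (the page does not name the tokenizer, so this is an assumption) and shows how such a string maps back to readable text. The function names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2 style
    byte-level BPE vocabularies (assumed here, not confirmed by the page)."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map a vocabulary string back to the UTF-8 text it stands for."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Tokens can end mid-character, so undecodable bytes are replaced
    # rather than raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The vocabulary form of the Japanese token from the positive logits.
    print(decode_token("\u00e3\u0124\u00a8\u00e3\u0125\u00ab"))
    # Plain ASCII tokens pass through unchanged.
    print(decode_token("Britain"))
```

Under that assumption, the garbled-looking entry decodes to the katakana pair エル, while ASCII tokens such as `Britain` are unaffected. A leading `Ġ` in a vocabulary string would decode to a space in the same way.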