INDEX
Explanations
instances of the word "text"
Negative Logits
CVE        -0.77
CVE        -0.72
rolet      -0.69
vernment   -0.67
MAL        -0.66
ño         -0.66
negie      -0.66
^^^^       -0.63
Kashmir    -0.61
Ern        -0.61
Positive Logits
ured       1.40
area       1.14
uring      1.11
uality     1.09
ural       1.08
ures       1.07
urized     1.06
ual        1.00
book       0.97
iles       0.96
Activations Density 0.022%