Explanations
mentions of the word "ny" with a high activation value
the presence of the term "ny" in various contexts
Negative Logits
EMP         -0.83
rador       -0.74
Reviewed    -0.72
ModLoader   -0.69
INGTON      -0.68
ENTION      -0.67
ributes     -0.67
Spread      -0.67
IBLE        -0.66
ENDED       -0.66
Positive Logits
mph         0.99
Mellon      0.86
acht        0.85
ny          0.84
Giuliani    0.81
heter       0.79
giene       0.77
cling       0.74
brook       0.73
die         0.72
Activations Density: 0.020%