INDEX
Explanations
phrases related to physical and emotional suffering or deprivation
instances of the substring "nes" in various contexts
Negative Logits
ivation    -0.80
arna       -0.72
atories    -0.70
rador      -0.68
masters    -0.67
irlf       -0.65
ivating    -0.64
rament     -0.64
izoph      -0.63
boards     -0.63
Positive Logits
earch      0.98
ville      0.94
nes        0.94
cape       0.89
boro       0.88
creen      0.86
erve       0.86
forth      0.84
esis       0.84
burgh      0.82
Activations Density: 0.014%