INDEX
Explanations
references to classic works or elements in culture
Negative Logits
earcher    -0.76
reon       -0.72
Simulator  -0.70
allah      -0.68
OTUS       -0.66
arching    -0.66
aughter    -0.66
lesh       -0.65
hani       -0.65
pta        -0.63
Positive Logits
arily      0.85
ised       0.81
ists       0.81
Revival    0.80
ist        0.79
ism        0.76
ized       0.74
Japanese   0.74
British    0.73
isation    0.72
Activations Density 0.009%