INDEX
Explanations
No Explanations Found
Negative Logits
Ħ¢              -0.83
disgusted       -0.76
unman           -0.65
civilisation    -0.64
VERT            -0.64
Passing         -0.64
ourselves       -0.63
policing        -0.61
AGA             -0.61
traumatic       -0.61
Positive Logits
GOODMAN         0.82
Constantin      0.74
fx              0.72
rich            0.72
eous            0.71
Brach           0.71
Hos             0.70
heng            0.69
isi             0.69
hya             0.69
Activations Density: 0.000%
No Known Activations
This feature has no known activations.