Explanations
references to religious or biblical objects, practices, or concepts
Negative Logits
suspect        -0.59
rani           -0.58
Rial           -0.56
Vain           -0.56
Amm            -0.55
ivor           -0.54
Duda           -0.54
Wy             -0.53
季              -0.53
dép            -0.52
Positive Logits
forehead        1.47
amus            1.06
atri            0.97
Marty           0.90
extrapolated    0.82
extrapolation   0.81
kloped          0.81
Marty           0.79
martyr          0.73
Todd            0.70
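The page does not state how the two logit lists were produced. A common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix, then report the most negative and most positive scores. The sketch below illustrates that idea in plain Python; `logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and the toy vectors are made up rather than taken from this feature.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Assumed method: score each token by dot(decoder_dir, unembedding column)
    and return the k lowest-scoring and k highest-scoring tokens."""
    # Direct logit contribution of the feature direction to every token.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    # Sort ascending: the front of the list holds the most negative scores.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Reverse so the highest scores come first.
    positive = ranked[::-1][:k]
    return negative, positive


# Toy, hypothetical 2-d example (not real model weights).
neg, pos = logit_effects(
    [1.0, -0.5],
    {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]},
    k=2,
)
print(neg)  # [('d', -1.0), ('b', -0.5)]
print(pos)  # [('a', 1.0), ('c', 0.5)]
```

Under this assumption, a token such as "forehead" tops the positive list because its unembedding column points most nearly in the same direction as the feature's decoder vector.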
Activations Density 0.003%
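Activation density is assumed here to mean the percentage of token positions in a sample corpus at which the feature's activation is above zero. The minimal sketch below computes that quantity. `activation_density` and its `threshold` parameter are hypothetical, and the sample inputs are invented for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Assumed definition: percentage of positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    # Count positions where the feature fires.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 2.5, 0.0]))  # 25.0
```

Read this way, a density of 0.003% corresponds to roughly 3 firing positions per 100,000 tokens, so the feature is very sparse.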