Explanations
references to specific individuals and their associations
Negative Logits
ff  -0.68
ns  -0.68
cc  -0.66
dd  -0.66
f   -0.64
rs  -0.64
ks  -0.64
lt  -0.63
gs  -0.63
nt  -0.62
Positive Logits
démocr      0.86
مشين        0.84
vastaan     0.84
énergé      0.82
supérieurs  0.81
colorés     0.81
réguli      0.80
palvel      0.79
normaux     0.78
innamor     0.76
Activation Density: 0.374%