INDEX
Explanations
the presence of specific names and titles associated with roles or organizations
Negative Logits
Token      Logit
Beir       -0.78
orb        -0.72
utical     -0.67
HAEL       -0.66
Finn       -0.66
CHAT       -0.66
english    -0.63
hovah      -0.63
Hav        -0.61
charism    -0.61
Positive Logits
Token      Logit
inct       0.90
ous        0.84
igent      0.80
gence      0.79
ils        0.78
cius       0.76
ts         0.76
gent       0.76
²¾         0.75
ations     0.74
Activations Density: 0.005%