Explanations
conditional statements or phrases indicating conditions
Negative Logits
Token     Logit
ngth      -0.85
âĸ¬       -0.72
heimer    -0.67
alus      -0.66
vez       -0.66
holm      -0.64
kees      -0.63
perse     -0.63
Mour      -0.62
odore     -0.62
Positive Logits
Token     Logit
ornia     1.00
unction   0.94
amily     0.90
FER       0.89
TY        0.83
ICA       0.82
FIN       0.82
orce      0.81
rame      0.79
TER       0.79
Activations Density 0.007%