INDEX
Explanations
words that indicate additional information or related content
Negative Logits
Token       Logit
rio         -0.70
Davie       -0.68
Spart       -0.67
tic         -0.66
ly          -0.65
c           -0.65
ity         -0.65
est         -0.65
Ines        -0.64
nin         -0.64
Positive Logits
Token       Logit
Normdatei   0.92
AsUp        0.88
gså         0.87
ępnie       0.86
кож         0.86
turut       0.85
ALSO        0.84
כן          0.83
וגם         0.82
ValueStyle  0.81
Activation density: 0.138%