INDEX
Explanations
statements that express opinions, claims, or factual assertions
Negative Logits
tal     -0.64
B       -0.59
the     -0.58
so      -0.58
式      -0.56
sof     -0.56
son     -0.56
sp      -0.56
turn    -0.54
ton     -0.54
Positive Logits
')")         0.95
itſelf       0.87
كومونز       0.87
encils       0.87
'%(          0.83
namefont     0.82
дописавши    0.82
Jaunes       0.82
")[          0.81
ingtones     0.81
Activations Density 0.569%