Explanations
phrases indicating the significance of an issue or situation, particularly emphasizing whether it is a "big deal" or not
no big deal or harm
Negative Logits
Token               Logit
nonUne              -0.60
surla               -0.59
MaterialApp         -0.58
KommentareTeilen    -0.56
Kjelder             -0.55
########.           -0.55
ModelExpression     -0.54
Autoritní           -0.52
createSlice         -0.52
Comprometido        -0.51
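Positive Logits
Token               Logit
harmless             0.63
innoc                0.41
nothing              0.40
insignificant        0.40
没事                 0.39   (Chinese: "it's nothing", "no problem")
innocently           0.38
problemlos           0.38   (German: "without problems")
Nothing              0.37
Miscellaneous        0.37
没什么               0.37   (Chinese: "it's nothing", "never mind")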
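Lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding and ranking the vocabulary by the resulting score. The page does not say how these particular values were computed, so the sketch below is an assumption: it scores each token as a plain dot product between a hypothetical decoder vector and that token's unembedding vector, and ignores details such as a final layer norm. The toy vectors are invented for illustration only.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_logits_for_feature(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative token scores.

    Hypothetical sketch: a token's score is the dot product of the
    feature's decoder direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_vec, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy, made-up unembedding vectors (2-dimensional for readability).
toy_unembed = {
    "harmless": [1.0, 0.5],
    "nothing": [0.8, 0.1],
    "createSlice": [-0.7, -0.2],
    "surla": [-0.9, 0.3],
}
pos, neg = top_logits_for_feature([0.6, 0.2], toy_unembed, k=2)
```

With these toy numbers, `pos` ranks "harmless" above "nothing" and `neg` ranks "surla" below "createSlice", mirroring the shape of the lists on this page.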
Activation density: 0.055%
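Activation density is commonly read as the fraction of token positions on which the feature fires. A 0.055% density therefore means the feature is active on roughly one token in 1 / 0.00055 ≈ 1,818. The sketch below assumes "fires" means an activation strictly above zero. The page does not state its threshold or sample size, and the `activation_density` helper is hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: a feature counts as "active" when its activation is > threshold
    (default 0.0); the dashboard's actual threshold is not stated.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up sample: 11 firing positions out of 20,000 tokens gives 0.055%.
acts = [0.0] * 19_989 + [1.0] * 11
density = activation_density(acts)
tokens_per_firing = 1 / density  # about 1,818 tokens per activation
```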