INDEX
Explanations
expressions related to criticism or recommendations regarding people's behavior and social issues
Negative Logits
Token            Logit
">//             -0.65
koli             -0.55
מעט              -0.47
föruts           -0.47
>//              -0.45
PropertyGroup    -0.45
openSession      -0.44
consentimento    -0.44
                 -0.44
ViewImports      -0.43
POSITIVE LOGITS
Token            Logit
stop             1.26
Stop             1.14
shut             1.11
Stop             1.10
STOP             1.08
stop             1.06
Shut             1.06
Shut             1.01
SHUT             1.01
shut             1.01
Activations Density: 0.243%