INDEX
Explanations
statements or opinions
expressions of personal or collective viewpoints
Negative Logits
ammy        -0.72
isol        -0.71
glomer      -0.68
andise      -0.67
vernment    -0.66
earthqu     -0.65
dust        -0.63
ilant       -0.63
iott        -0.62
resist      -0.61
Positive Logits
finder      1.15
views       0.93
ports       0.87
points      0.86
ĸ           0.82
view        0.81
opian       0.79
port        0.79
viewpoint   0.77
ership      0.77
Activations Density: 0.030%